Analytical results are derived for the bond percolation threshold and the size of the giant connected component in a class of random networks with non-zero clustering. The network's degree distribution and clustering spectrum may be prescribed, and theoretical results match well to numerical simulations on both synthetic and real-world networks.
Random network models have been extensively studied with a view to gaining insight into the structure and dynamics of many social, technological, and biological networks. However, most analytical approaches rely on tree-like approximations of the local network structure and thus neglect the presence of short loops (cycles) in the graphs. The local clustering coefficient for a node A is defined as the fraction of pairs of neighbors of node A which are also neighbors of each other [4], and is typically non-negligible in real-world networks. The degree-dependent clustering or clustering spectrum $C_k$ is the average of the local clustering coefficient over the class of all nodes of degree $k$ [1]. The question of how network models with non-zero $C_k$ (taken, for example, from real-world network data) differ from randomly-wired (configuration-model) networks with the same degree distribution $P_k$ is of considerable interest.
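As an illustration of the two definitions above, the following minimal Python sketch computes the local clustering coefficient of every node and averages it over nodes of equal degree to obtain the clustering spectrum. The function name `clustering_spectrum` and the adjacency-dictionary input format are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations

def clustering_spectrum(adj):
    """Return {k: C_k} for an undirected simple graph.

    adj maps each node to the set of its neighbors (hypothetical input format).
    Nodes of degree < 2 have no neighbor pairs and are skipped.
    """
    totals = defaultdict(float)   # sum of local clustering over degree-k nodes
    counts = defaultdict(int)     # number of degree-k nodes
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # count pairs of neighbors that are themselves connected
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        totals[k] += linked / (k * (k - 1) / 2)
        counts[k] += 1
    return {k: totals[k] / counts[k] for k in counts}

# A triangle (0, 1, 2) with one pendant node (3) attached to node 0:
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
spectrum = clustering_spectrum(adj)   # C_2 = 1 and C_3 = 1/3
```

In this small example the two degree-2 nodes each have their single neighbor pair connected, while the degree-3 node has one connected pair out of three.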

The bond percolation problem for a network may be stated as follows: each edge of the network graph is visited once, and damaged (deleted) with probability $1 - p$. The quantity $p$ is the bond occupation probability and the non-damaged edges are termed occupied. The size of the giant connected component (GCC) of the graph becomes nonzero at some critical value of $p > 0$: this critical value of $p$ is termed the bond percolation threshold $p_{\rm th}$. The bond percolation problem has applications in epidemiology, where $p$ is related to the average transmissibility of a disease and the GCC represents the size of an epidemic outbreak [3], and in the analysis of technological networks, where the resilience of a network to the random failure of links is quantified by the size of the GCC. Analytical solutions for percolation on randomly-wired networks and on correlated networks are well-known [10, 11, 12, 13], but these cases have zero clustering in the limit of infinite network size.
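The damage process just described is straightforward to simulate. The sketch below is an illustrative assumption rather than the simulation code used for the results reported here: it keeps each edge with probability $p$ and tracks connected components with a union-find structure, returning the largest component as a fraction of all nodes (a finite-size proxy for the GCC). The function name and argument layout are hypothetical.

```python
import random

def largest_component_fraction(n, edges, p, rng):
    """Occupy each edge with probability p; return (largest component size) / n."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # find the component root, halving paths as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if rng.random() < p:          # edge is occupied (not damaged)
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru       # attach the smaller tree to the larger
                size[ru] += size[rv]
    return max(size[find(i)] for i in range(n)) / n

# Usage on a ring of 10 nodes: p = 1 keeps the ring intact, p = 0 isolates every node.
ring = [(i, (i + 1) % 10) for i in range(10)]
intact = largest_component_fraction(10, ring, 1.0, random.Random(0))
shattered = largest_component_fraction(10, ring, 0.0, random.Random(0))
```

Averaging this quantity over many realizations, for a range of $p$ values, gives numerical GCC curves of the kind compared with theory later in the paper.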

In this paper we introduce a class of networks with non-zero clustering, and demonstrate analytical solutions for the GCC size and the bond percolation threshold. Most previous studies of clustering effects on percolation rely on numerical simulations using various algorithms to generate clustered networks, e.g. [14, 15, 16]. Analytical solutions were found by Newman [17] for a bipartite graph model of highly clustered networks. However, the bipartite graph model (in contrast to the model discussed here) is not amenable to fitting to a prescribed degree distribution $P_k$. The bipartite graph model of Guillaume and Latapy [18] may be fitted to real-world data but their networks do not permit analytical solution of the percolation problem. Serrano and Boguñá [19] also obtain approximate analytical solutions, but only for weak clustering cases with $C_k < 1/(k-1)$. Trapman [20] introduced a model of clustering in structured graphs based on embedding cliques (complete subgraphs) within a random tree structure. We show below that this model, and its generalization [21], are in fact special cases of the model presented here. In a recent paper [22], Newman introduced a triangle-based model of clustered networks which may be seen as complementary to the model presented here: we discuss this model in detail at the end of the paper.

FIG. 1: (a) Segment of a clustered random network; (b) split into disjoint cliques, with external links emphasized; (c) corresponding super-nodes.

We consider random networks in which each node may be part of a single clique (a fully-connected subgraph). Figure 1(a) shows a segment of such a network which contains one 3-clique (triangle), one 4-clique, and a single node which is not a member of a clique (for notational convenience we will refer to such individual nodes as members of a 1-clique). Nodes which are members of a $c$-clique have $c - 1$ edges linking them to neighbors within the same clique. They also have an additional $k - c + 1$ neighbors who are not in the same clique as themselves, where $k$ is the node degree (with $k \ge c - 1$). Edges which are not internal to a clique are termed external links. In Fig. 1(b) the external links are highlighted with thick lines, but for the purposes of the bond percolation problem they are indistinguishable from clique edges. In networks of this type each node is a member of at most one clique, and so the network can be decomposed into disjoint cliques which are linked together by the set of external links, see Fig. 1(b) [27]. If each clique is regarded as a super-node (Fig. 1(c)) then realizations of the random network may be generated by connecting together randomly chosen pairs of the external link stubs, as in the configuration model for standard random networks.

The fundamental quantity describing networks of this type is the joint probability distribution $\gamma(k,c)$, giving the probability that a randomly-chosen node in the network has degree $k$ and is a member of a $c$-clique. Note that $\gamma(k,c) = 0$ for $k < c - 1$, i.e., $k$-degree nodes can only be members of $c$-cliques if their degree is high enough to provide links to all $c - 1$ clique neighbors. The degree distribution $P_k$ of the network (probability that a random node has $k$ neighbors) is obtained from $\gamma$ by summing over all clique sizes:

$$P_k = \sum_{c=1}^{k+1} \gamma(k,c). \tag{1}$$



A node chosen at random from the set of all $k$-degree nodes is a member of a $c$-clique with probability $\gamma(k,c)/P_k$. As a member of a $c$-clique, it is part of $\binom{c-1}{2}$ triangles, and so its local clustering coefficient is $\binom{c-1}{2} / \binom{k}{2}$. Therefore the degree-dependent clustering coefficient $C_k$ is given in terms of $\gamma$ by

$$C_k = \sum_{c=1}^{k+1} \frac{\gamma(k,c)}{P_k}\, \frac{(c-1)(c-2)}{k(k-1)}. \tag{2}$$



The class of networks described by the joint pdf $\gamma(k,c)$ includes the well-studied configuration model, for which $\gamma(k,c) = \delta_{c,1} P_k$. This limit contains no cliques, and hence the clustering (in the infinite network size limit) vanishes. Also contained within the class of networks is the Trapman model [20, 21], in which a fraction $f_k$ of $k$-degree nodes form cliques of precisely $k$ nodes, giving $\gamma(k,c) = \delta_{c,1}(1 - f_k)P_k + \delta_{c,k} f_k P_k$.
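To make the role of the joint distribution concrete, the sketch below evaluates equations (1) and (2) for a distribution stored as a Python dictionary, and builds the Trapman special case described above. The dictionary representation `{(k, c): probability}` and all function names are assumptions introduced for this example only.

```python
def degree_distribution(gamma):
    """Equation (1): P_k is the sum of gamma(k, c) over clique sizes c."""
    P = {}
    for (k, c), g in gamma.items():
        P[k] = P.get(k, 0.0) + g
    return P

def clustering_from_gamma(gamma):
    """Equation (2): C_k for every degree k >= 2 with P_k > 0."""
    P = degree_distribution(gamma)
    C = {}
    for (k, c), g in gamma.items():
        if k >= 2 and P[k] > 0:
            C[k] = C.get(k, 0.0) + (g / P[k]) * (c - 1) * (c - 2) / (k * (k - 1))
    return C

def trapman_gamma(P, f):
    """Trapman special case: a fraction f[k] of k-degree nodes sit in k-cliques."""
    gamma = {}
    for k, Pk in P.items():
        for c, w in ((1, 1.0 - f.get(k, 0.0)), (k, f.get(k, 0.0))):
            if c >= 1 and w > 0:
                # accumulate, since c = 1 and c = k coincide when k = 1
                gamma[(k, c)] = gamma.get((k, c), 0.0) + w * Pk
    return gamma

# Degree-3 nodes split between 1-cliques, triangles and 4-cliques:
example = {(3, 1): 0.2, (3, 3): 0.3, (3, 4): 0.5}
C3 = clustering_from_gamma(example)[3]   # 0.3 * (2/6) + 0.5 * (6/6) = 0.6
```

For the Trapman case, equation (2) reduces to $C_k = f_k (k-2)/k$, which the functions above reproduce.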



FIG. 2: Tree diagram for updating the state of node B.



To determine the expected size of the GCC in the damaged network we choose a random node A of the network and approximate the network as a tree structure, with the node A at the top (root) of the tree. Each level of the tree structure (see Fig. 2) is accessed from the level above by traversing one external link. If a node is part of a $c$-clique, the remaining $c - 1$ clique neighbors are shown at an intermediate level. Because the graph of super-nodes (Fig. 1(c)) is connected using the configuration model, this tree structure is a locally accurate approximation to the original network in the limit of infinite system size.

To calculate the probability that node A is part of the GCC, we apply a tree-based approach which is generalizable to a variety of cascade dynamics on networks and is related to work on the random field Ising model. We label nodes which are part of a connected component as active, with the remaining nodes termed inactive. All nodes of the tree are initially considered inactive, and we examine the propagation of the active state upwards through the tree (from leaves to root) as an infection process beginning from an infinitesimally small fraction of active nodes infinitely deep in the tree. Consider a node at level $n$, e.g. node B in Fig. 2. Initially B (and its parent at level $n + 1$) is inactive, but suppose nodes at level $n - 1$ are active with probability $q$. If B has degree $k$, and is a member of a $c$-clique (e.g. $k = 6$ and $c = 4$ in Fig. 2), it has $k - c + 1$ external links, one of which necessarily leads to its parent at level $n + 1$. The node B will become active if any one of its $k - c$ externally-linked children at level $n - 1$ is active, provided that an occupied edge joins that child to B; thus the probability that B is not activated in this fashion is $(1 - pq)^{k-c}$. The other mechanism whereby B may be activated is via its neighbors in the $c$-clique; writing $Q_c$ for the probability that the top-node (such as B) of a $c$-clique is activated by its clique neighbors, we have the total probability of activation for B of $1 - (1 - pq)^{k-c}(1 - Q_c)$. The probability $Q_c$ is calculated using the $P(m|k)$ functions introduced and tabulated in [17], which are polynomials in $p$ giving the probability that a randomly chosen node in a damaged (i.e., taking into account bond percolation) $c$-clique belongs to a connected cluster of $m$ nodes (including itself) within the clique. Since node B is activated if any one of its $m - 1$ connected neighbors is active, we have



$$Q_c = \sum_{m=1}^{c} P(m|c)\left[1 - (1 - \bar{q}_c)^{m-1}\right], \tag{3}$$

where $\bar{q}_c$ is the probability of a $c$-clique member at the intermediate level being activated by its level-$(n-1)$ children:

$$\bar{q}_c = \sum_{k'} \frac{\gamma(k',c)}{\sum_{k''} \gamma(k'',c)} \left[1 - (1 - pq)^{k'-c+1}\right]. \tag{4}$$

Here $\gamma(k',c)/\sum_{k''} \gamma(k'',c)$ is the degree distribution of nodes which are members of cliques of size $c$, and the remaining term is the probability that a $k'$-degree node in a $c$-clique is activated by one of its $k' - c + 1$ children at level $n - 1$.
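The cluster-size probabilities $P(m|c)$ are taken from tabulated polynomials in the text. For small cliques they can also be obtained by brute force, which gives a convenient check. The sketch below is an assumption-laden illustration (it enumerates every occupied/unoccupied pattern of the $c(c-1)/2$ clique edges, so it is only practical for small $c$) and then evaluates the clique activation probability of equation (3) for a given intermediate-level probability, here called `qbar`. All names are hypothetical.

```python
from itertools import combinations, product

def clique_cluster_probs(c, p):
    """Return [P(1|c), ..., P(c|c)] by enumerating edge configurations of a c-clique."""
    edges = list(combinations(range(c), 2))
    probs = [0.0] * c
    for occupied in product((False, True), repeat=len(edges)):
        weight = 1.0
        adj = {v: [] for v in range(c)}
        for (u, v), on in zip(edges, occupied):
            weight *= p if on else 1.0 - p
            if on:
                adj[u].append(v)
                adj[v].append(u)
        # size of the cluster containing node 0 (all nodes are equivalent)
        seen, stack = {0}, [0]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        probs[len(seen) - 1] += weight
    return probs

def clique_activation_prob(c, p, qbar):
    """Equation (3): probability that the top node is activated via its clique."""
    return sum(P * (1.0 - (1.0 - qbar) ** (m - 1))
               for m, P in enumerate(clique_cluster_probs(c, p), start=1))

# For a triangle at p = 0.5 the enumeration gives P(1|3), P(2|3), P(3|3) = 0.25, 0.25, 0.5.
triangle = clique_cluster_probs(3, 0.5)
```

A 1-clique has no internal edges, so the enumeration returns $P(1|1) = 1$ and the activation probability $Q_1$ vanishes, as used later for the configuration-model limit.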

Given $q$, we can therefore calculate, using equations (3) and (4), the probability of B becoming active. To close the system of equations, we consider the parent of B at level $n + 1$, for whom $q$ is the probability that one of its children is active. Since node B has $k - c + 1$ external links in total, the probability of it being a child of a random level-$(n+1)$ node is $\Pi_{kc} = (k - c + 1)\gamma(k,c)/z_e$, where $z_e = \sum_{k',c'} (k' - c' + 1)\gamma(k',c')$ is the average number of external links per node. Combining the equations above gives the closure relation

$$q = \sum_{k,c} \Pi_{kc} \left[1 - (1 - pq)^{k-c}(1 - Q_c)\right] \equiv G(q). \tag{5}$$

Equations (3)-(5) are solved by iterating from an infinitesimally small value of $q$ to a steady-state solution: this determines the probability (in an infinite network) that a node is active, conditional on its parent being inactive. The final calculation of the GCC size considers the node A at the top (root) of the tree: with probability $\gamma(k,c)$ this has $k - c + 1$ direct links to the level below. By similar arguments to before, node A is active with probability

$$S = \sum_{k,c} \gamma(k,c) \left[1 - (1 - pq)^{k-c+1}(1 - Q_c)\right], \tag{6}$$

where $q$ is the solution of equations (3)-(5) and $S$ is the expected fractional size of the GCC.
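The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the figures: $\gamma(k,c)$ is stored as a dictionary `{(k, c): probability}`, the clique polynomials $P(m|c)$ are obtained by brute-force enumeration (practical only for small cliques), and the starting value `q0`, the tolerance and the function names are arbitrary choices.

```python
from itertools import combinations, product
from math import exp, factorial

def clique_cluster_probs(c, p):
    """[P(1|c), ..., P(c|c)] by enumerating edge configurations of a c-clique."""
    edges = list(combinations(range(c), 2))
    probs = [0.0] * c
    for occupied in product((0, 1), repeat=len(edges)):
        weight, label = 1.0, list(range(c))
        for (u, v), on in zip(edges, occupied):
            weight *= p if on else 1.0 - p
            if on:  # merge the component labels of u and v
                lu, lv = label[u], label[v]
                label = [lu if x == lv else x for x in label]
        probs[label.count(label[0]) - 1] += weight
    return probs

def gcc_size(gamma, p, q0=1e-6, tol=1e-12, max_iter=100000):
    """Iterate equations (3)-(5) from a small q, then evaluate S from equation (6)."""
    gamma = {kc: g for kc, g in gamma.items() if g > 0}
    z_e = sum((k - c + 1) * g for (k, c), g in gamma.items())
    sizes = sorted({c for (_, c) in gamma})
    norm = {c: sum(g for (_, cc), g in gamma.items() if cc == c) for c in sizes}
    Pm = {c: clique_cluster_probs(c, p) for c in sizes}

    def clique_probs(q):
        Q = {}
        for c in sizes:
            qbar = sum(g / norm[c] * (1 - (1 - p * q) ** (k - c + 1))
                       for (k, cc), g in gamma.items() if cc == c)      # eq. (4)
            Q[c] = sum(P * (1 - (1 - qbar) ** (m - 1))
                       for m, P in enumerate(Pm[c], start=1))           # eq. (3)
        return Q

    q = q0
    for _ in range(max_iter):
        Q = clique_probs(q)
        q_new = sum((k - c + 1) * g / z_e
                    * (1 - (1 - p * q) ** (k - c) * (1 - Q[c]))
                    for (k, c), g in gamma.items() if k - c + 1 > 0)    # eq. (5)
        converged = abs(q_new - q) < tol
        q = q_new
        if converged:
            break
    Q = clique_probs(q)
    return sum(g * (1 - (1 - p * q) ** (k - c + 1) * (1 - Q[c]))
               for (k, c), g in gamma.items())                          # eq. (6)

# Configuration-model check: Poisson degrees with mean z = 3, undamaged network (p = 1).
z = 3.0
poisson = {(k, 1): z ** k * exp(-z) / factorial(k) for k in range(41)}
S = gcc_size(poisson, 1.0)   # should satisfy S = 1 - exp(-z S) for this case
```

For the Poisson configuration model the equations collapse to the classical self-consistency condition $S = 1 - e^{-zpS}$, which provides a simple sanity check on the iteration.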

Note that if we set $\gamma(k,c) = \delta_{c,1} P_k$, equations (5) and (6) reduce to their well-studied configuration model versions (using $Q_1 = 0$). The bond percolation results of [20, 21] for the generalized Trapman model are also a special case of equations (3)-(6). Also of interest is the GCC size in an undamaged network; this is obtained from our equations by setting $p = 1$ (for which $P(m|c) = \delta_{m,c}$).

The bond percolation threshold is the value of $p$ at which the GCC size first becomes nonzero. This may be determined from the cascade condition [21] $G'(0) = 1$, where $G(q)$ is defined in equation (5). The resulting polynomial equation for $p$ may be written in the form

$$\sum_{k,c} (k - c + 1)\gamma(k,c) \left[p(k - c) + (z_c - c + 1) D_c(p)\right] = z_e, \tag{7}$$

where $D_c(p) = p \sum_{m=1}^{c} (m - 1) P(m|c)$ (see [21]) and $z_c$ is the average degree of nodes in cliques of size $c$: $z_c = \sum_k k\,\gamma(k,c) / \sum_{k'} \gamma(k',c)$.
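The step from the cascade condition to equation (7) can be sketched as follows. This is a reconstruction of the intermediate algebra rather than text from the original derivation; $\bar{q}_c$ denotes the intermediate-level activation probability of equation (4), primes denote derivatives with respect to $q$ evaluated at $q = 0$, and we use the fact that $Q_c$ and $\bar{q}_c$ both vanish at $q = 0$.

```latex
\begin{align*}
G'(0) &= \sum_{k,c} \frac{(k-c+1)\,\gamma(k,c)}{z_e}
         \left[\, p\,(k-c) + Q_c'(0) \,\right], \\
Q_c'(0) &= \bar{q}_c'(0) \sum_{m=1}^{c} (m-1)\, P(m|c), \\
\bar{q}_c'(0) &= p \sum_{k'} \frac{(k'-c+1)\,\gamma(k',c)}{\sum_{k''} \gamma(k'',c)}
              = p\,(z_c - c + 1), \\
\Rightarrow\quad Q_c'(0) &= (z_c - c + 1)\, D_c(p).
\end{align*}
```

Setting $G'(0) = 1$ and multiplying through by $z_e$ then gives the polynomial condition (7). In the configuration-model limit only $c = 1$ contributes and $D_1(p) = 0$, so the condition reduces to $p \sum_k k(k-1) P_k = \sum_k k P_k$.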

We now describe an algorithm for generating realizations of random networks with a prescribed distribution $\gamma(k,c)$. For a large number $\tilde{N}$ (which is related to the number $N$ of nodes in the final network, see below), we choose $\tilde{N}$ random numbers $c_i$ ($i = 1$ to $\tilde{N}$) with pdf

$$\frac{\sum_k \gamma(k,c)/c}{\sum_{k,c'} \gamma(k,c')/c'}$$

to be the clique sizes in the network realization. For each $i$, we create $c_i$ nodes in a complete subgraph, and assign their degrees $k_j$ ($j = 1$ to $c_i$) by drawing random numbers from a distribution with density $\gamma(k,c_i)/\sum_{k'} \gamma(k',c_i)$. Node $j$ in clique $i$ then has $k_j - c_i + 1$ external link stubs associated with it. Having created all $\tilde{N}$ cliques in this fashion, we randomly choose pairs of external link stubs and connect them together to create the random network (cf. Fig. 1). The expected number of nodes in a network generated using this algorithm is $N = \tilde{N} / \left(\sum_{k,c} \gamma(k,c)/c\right)$, which allows us to estimate the value of $\tilde{N}$ needed to produce a final network of size $N$. For finite-sized networks, the presence of cliques means this algorithm is not guaranteed to give exactly $N$ nodes in the final network, but in practice we find the variation in the network size is negligibly small for sufficiently large $N$.
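The generation algorithm above might be sketched as follows. This is an illustrative reading of the steps, with several assumptions made explicit: $\gamma(k,c)$ is a dictionary `{(k, c): probability}`, an unpaired stub is simply discarded when the total stub count is odd, and (as in the standard configuration model) self-loops and repeated edges are not filtered out. The function name and return format are hypothetical.

```python
import random

def generate_clustered_network(gamma, n_cliques, rng):
    """Return (number of nodes, edge list, clique index of each node)."""
    sizes = sorted({c for (_, c) in gamma})
    # clique-size weights proportional to sum_k gamma(k, c) / c
    size_w = [sum(g for (_, cc), g in gamma.items() if cc == c) / c for c in sizes]
    # per-clique-size degree distributions, proportional to gamma(k, c)
    degs = {c: [k for (k, cc) in gamma if cc == c] for c in sizes}
    deg_w = {c: [gamma[(k, c)] for k in degs[c]] for c in sizes}

    edges, stubs, clique_of = [], [], []
    for i in range(n_cliques):
        c = rng.choices(sizes, weights=size_w)[0]
        members = list(range(len(clique_of), len(clique_of) + c))
        clique_of.extend([i] * c)
        # internal edges: complete subgraph on the clique members
        edges.extend((u, v) for a, u in enumerate(members) for v in members[a + 1:])
        for u in members:
            k = rng.choices(degs[c], weights=deg_w[c])[0]
            stubs.extend([u] * (k - c + 1))      # external link stubs
    if len(stubs) % 2:
        stubs.pop()                              # assumption: drop one unpaired stub
    rng.shuffle(stubs)
    edges.extend(zip(stubs[::2], stubs[1::2]))   # pair stubs at random
    return len(clique_of), edges, clique_of

# Every node has degree 3 and sits in a triangle, leaving one external stub per node.
n, edges, clique_of = generate_clustered_network({(3, 3): 1.0}, 4, random.Random(1))
```

In this usage example the four triangles contribute 12 nodes and 12 internal edges, and the 12 external stubs pair into 6 further edges, consistent with $N = \tilde{N} / (\sum_{k,c} \gamma(k,c)/c) = 3\tilde{N}$.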

Figure 3(a) shows a comparison between GCC sizes from theory (from equations (3)-(6)) and numerical simulations on networks with the Poisson degree distribution $P_k = z^k e^{-z}/k!$ and mean degree $z = 3$. We create non-zero clustering in the networks by inserting 3-cliques (triangles) and 4-cliques; specifically, we set $\gamma(k,c) = \left((1 - \alpha - \beta)\delta_{c,1} + \alpha\delta_{c,3} + \beta\delta_{c,4}\right) P_k$ for $k \ge 3$. This embeds a fraction $\alpha$ and $\beta$ of $k$-degree nodes in 3-cliques and 4-cliques, respectively, with the remainder as individuals (i.e., 1-cliques). Since nodes of degree $k$ cannot be part of $c$-cliques when $c$ exceeds $k + 1$, we deal with nodes of degree $k < 3$ as follows: $\gamma(2,c) = \left((1 - \alpha)\delta_{c,1} + \alpha\delta_{c,3}\right) P_2$, and $\gamma(k,c) = P_k \delta_{c,1}$ for $k = 0$ or 1. The case $\alpha = \beta = 0$ gives the standard configuration model network, with zero clustering. We also show results for $\alpha = 0.8$, $\beta = 0.1$ and for $\alpha = 0$, $\beta = 1$. The first of these corresponds to a global clustering coefficient $C = \sum_k P_k C_k$ of 0.31, while the second case, which contains only 4-cliques, has $C \approx 0.35$. The corresponding bond percolation thresholds may be calculated from the polynomial equation (7) using (see [21]) $D_3(p) = 2p^2(1 + p - p^2)$ and $D_4(p) = 3p^2(1 + 2p - 7p^3 + 7p^4 - 2p^5)$. The resulting values are $p_{\rm th} = 0.349$ and $p_{\rm th} = 0.422$, both exceeding the configuration model value of $p_{\rm th} = 1/z = 1/3$ [13]. Numerical simulation results on networks of size N = 10^ are shown by the symbols, while the curves are the theoretical predictions of equations (3)-(6). The agreement between theory and numerics is excellent.

FIG. 3: (Color online) Size of giant connected component as a function of bond occupation probability for (a) synthetic networks with Poisson degree distribution (symbols: $\alpha = 0.0$, $\beta = 0.0$; $\alpha = 0.8$, $\beta = 0.1$; $\alpha = 0.0$, $\beta = 1.0$); (b) pretty-good-privacy network, for $p$ near $p_{\rm th}$; (c) as (b), but for all $p$ values.
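The global clustering coefficients and thresholds quoted for the synthetic Poisson example can be reproduced with a short script. The sketch below is an illustration only: it truncates the Poisson distribution at an assumed cutoff `kmax`, uses the $D_3$ and $D_4$ polynomials quoted above (with $D_1 = 0$), and solves equation (7) by bisection. The function names and the dictionary layout for $\gamma(k,c)$ are hypothetical.

```python
from math import exp, factorial

def example_gamma(alpha, beta, z=3.0, kmax=60):
    """gamma(k, c) for the Poisson example with 3-clique and 4-clique fractions."""
    gamma = {}
    for k in range(kmax + 1):
        Pk = z ** k * exp(-z) / factorial(k)
        if k >= 3:
            parts = {1: 1.0 - alpha - beta, 3: alpha, 4: beta}
        elif k == 2:
            parts = {1: 1.0 - alpha, 3: alpha}
        else:
            parts = {1: 1.0}
        for c, w in parts.items():
            if w > 0:
                gamma[(k, c)] = w * Pk
    return gamma

# Clique polynomials D_c(p) as quoted in the text; D_1 vanishes.
D = {1: lambda p: 0.0,
     3: lambda p: 2 * p**2 * (1 + p - p**2),
     4: lambda p: 3 * p**2 * (1 + 2*p - 7*p**3 + 7*p**4 - 2*p**5)}

def global_clustering(gamma):
    """C = sum_k P_k C_k, using equation (2)."""
    return sum(g * (c - 1) * (c - 2) / (k * (k - 1))
               for (k, c), g in gamma.items() if k >= 2)

def threshold(gamma, iters=60):
    """Solve equation (7) for p by bisection on [0, 1]."""
    z_e = sum((k - c + 1) * g for (k, c), g in gamma.items())
    sizes = {c for (_, c) in gamma}
    z_c = {c: sum(k * g for (k, cc), g in gamma.items() if cc == c)
              / sum(g for (_, cc), g in gamma.items() if cc == c) for c in sizes}

    def f(p):  # left-hand side minus right-hand side of equation (7)
        return sum((k - c + 1) * g * (p * (k - c) + (z_c[c] - c + 1) * D[c](p))
                   for (k, c), g in gamma.items()) - z_e

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for alpha, beta in [(0.0, 0.0), (0.8, 0.1), (0.0, 1.0)]:
    g = example_gamma(alpha, beta)
    print(alpha, beta, round(global_clustering(g), 3), round(threshold(g), 3))
```

Bisection is used here only for simplicity; it relies on the left-hand side of equation (7) increasing with $p$ on $[0, 1]$, which holds for the $D_3$ and $D_4$ polynomials above.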

One of the motivations for the introduction of the $\gamma$ networks is the ability to obtain analytical results for networks with given degree distribution $P_k$ and clustering spectrum $C_k$. Equations (1) and (2) constrain the distribution $\gamma(k,c)$ to fit a desired $P_k$ and $C_k$, which may be measured, for example, in a real-world network. However, these constraints still permit significant freedom in choosing $\gamma(k,c)$. It is convenient therefore to consider a parametrization of $\gamma(k,c)$ which allows straightforward fitting to given network data. We suppose that the distribution of clique-sizes $c$ occupied by nodes of degree $k$ is given by a binomial distribution, defining



$$\gamma(k,c) = P_k \binom{k}{c-1} g_k^{\,c-1} (1 - g_k)^{k-c+1} \tag{8}$$



for $c = 1$ to $k + 1$. This distribution clearly satisfies (1), and it distributes the probability mass corresponding to the $k$-degree nodes over the $c$-clique sizes via the single parameter $g_k$. The relationship between the parameters $g_k$ and the clustering spectrum $C_k$ is remarkably simple; substituting the parametrization (8) into (2) yields $C_k = g_k^2$. Thus the form (8) for $\gamma$ may immediately be fitted to the $P_k$ and $C_k$ of a real-world dataset by setting $g_k = \sqrt{C_k}$. Figures 3(b) and 3(c) show the results of applying the parametrization (8) to match the degree distribution and clustering spectrum of the connected component of the pretty-good-privacy (PGP) network. The PGP network is highly clustered, with $C_k > 1/(k-1)$ for most $k$. Numerical calculations of the GCC size in this network are shown by the symbols on Figs. 3(b) and 3(c); also shown are the theoretical predictions for the zero-clustering (configuration model) case and the results of equations (3)-(6) with parametrization (8). The effects of clustering on the percolation threshold are well-captured by the $\gamma$ theory, see Fig. 3(b). Note that here, in contrast to the example in Fig. 3(a), clustering acts to decrease the percolation threshold: equation (7) gives $p_{\rm th} = 0.0236$, which is less than half the configuration model value of 0.0559. The $\gamma$ theory gives quite a good approximation to the actual GCC size for bond occupation probabilities $p$ up to about 0.5 (Fig. 3(c)); however the behavior at larger $p$ values is less accurate, with the predicted GCC in the undamaged ($p = 1$) network being substantially smaller than its true value. This inaccuracy may be attributable to excessive clustering being induced by the parametrization (8) for the large $p$ case.
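The binomial parametrization (8) is simple to implement, and the identity $C_k = g_k^2$ can be checked numerically. The following sketch is illustrative only: the inputs `P` and `C` are hypothetical dictionaries of measured $P_k$ and $C_k$ values, and the function names are not from the paper.

```python
from math import comb, sqrt

def binomial_gamma(P, C):
    """Equation (8) with g_k = sqrt(C_k); P and C map degree k to P_k and C_k."""
    gamma = {}
    for k, Pk in P.items():
        g = sqrt(C.get(k, 0.0))
        for c in range(1, k + 2):
            gamma[(k, c)] = Pk * comb(k, c - 1) * g ** (c - 1) * (1 - g) ** (k - c + 1)
    return gamma

def spectrum_from_gamma(gamma):
    """Recover P_k from equation (1) and C_k from equation (2)."""
    P, C = {}, {}
    for (k, c), g in gamma.items():
        P[k] = P.get(k, 0.0) + g
    for (k, c), g in gamma.items():
        if k >= 2 and P[k] > 0:
            C[k] = C.get(k, 0.0) + (g / P[k]) * (c - 1) * (c - 2) / (k * (k - 1))
    return P, C

# Toy input: two degree classes with prescribed clustering.
P = {2: 0.5, 4: 0.5}
C = {2: 0.25, 4: 0.09}
P_check, C_check = spectrum_from_gamma(binomial_gamma(P, C))
```

The recovered values agree with the inputs because the second factorial moment of a binomial variable with parameters $k$ and $g_k$ is $k(k-1)g_k^2$, which is exactly the numerator appearing in equation (2).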

In summary, we have introduced a class of clustered random networks with arbitrary degree distribution and clustering spectrum, and analytically determined the size of the GCC and the bond percolation threshold. Numerically generated networks show excellent agreement with the theoretical results, and we have demonstrated the applicability of the theory by fitting to the pretty-good-privacy network to produce accurate predictions of the GCC size for small $p$. We have used a cascade-based approach here in preference to a generating function method, because (as we show in a subsequent paper) this approach generalizes to give analytical results for $k$-core sizes, Watts' threshold decision model, and other cascading dynamics on clustered networks [25].

It is instructive to compare our $\gamma$-theory networks with the clustered network model recently introduced by Newman [22]. In his model, a $k$-degree node may be a member of up to $k/2$ disjoint triangles (3-cliques), and thus have a local clustering coefficient of up to $1/(k-1)$. In contrast, nodes in the $\gamma$-theory networks can be members of only a single clique, but using large cliques can give arbitrarily high clustering. The restriction $C_k \le 1/(k-1)$ imposed on Newman's model networks inhibits a direct fit to most real-world networks, in contrast to our results in Fig. 3. It would be interesting to explore the possibility of modelling networks with multiple cliques per node (as in [22]) while allowing the cliques to be larger than triangles (as here). Indeed, a general model of this type is proposed in [26], but it seems unlikely that easily computable analytical solutions, as found here and in [22], can be obtained in this more general setting.

Discussions with Sergey Melnik, Adam Hackett and 
[27] We differentiate between external links and 2-cliques as follows. An external link joins together two nodes, each of which may be part of its own clique, e.g., at the top of Fig. 1(a) an external link joins a 3-clique node to a 4-clique node. A 2-clique is also an edge joining two nodes, but because these nodes are in the 2-clique they cannot be part of any other clique and so can link to the remainder of the network only through external links.



